Speech synthesis by structured segments, using temporal decomposition and a glottal excitation

Authors

  • Frédéric Bimbot
  • Gérard Chollet
  • Paul Deléglise

Abstract

Classical speech synthesis systems either concatenate diphone-like tabulated patterns or reconstruct speech parameters according to pre-defined rules. Both techniques have drawbacks: the former lacks flexibility, while the latter is highly time-consuming to build. We propose an intermediate technique using structured segments: segmental units are still resorted to, but they are automatically analysed in terms of a set of spectral targets, a temporal decomposition pattern and a parametric glottal excitation. Structured segments can then be handled by rules. They also supply valuable material that can be referred to when gradually building a reconstructive synthesis system.
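As a rough illustration of what a structured segment could look like in code, the sketch below assumes the usual temporal decomposition model, in which each frame of speech parameters is approximated as a weighted sum of spectral targets, y(n) = Σ_k φ_k(n) · a_k. The class `StructuredSegment`, its field names, the single `f0_hz` value standing in for the parametric glottal excitation, and the toy numbers are all hypothetical; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StructuredSegment:
    """Hypothetical container; field names are illustrative, not from the paper."""
    targets: List[List[float]]        # K spectral target vectors, each of dimension D
    interpolation: List[List[float]]  # K interpolation functions, each sampled over N frames
    f0_hz: float                      # simplified stand-in for the glottal excitation parameters


def reconstruct(segment: StructuredSegment) -> List[List[float]]:
    """Rebuild N frames of D parameters as y(n) = sum_k phi_k(n) * a_k."""
    n_frames = len(segment.interpolation[0])
    dim = len(segment.targets[0])
    frames = []
    for n in range(n_frames):
        frame = [0.0] * dim
        # Accumulate the contribution of each target, weighted by its
        # interpolation function at frame n.
        for target, phi in zip(segment.targets, segment.interpolation):
            for d in range(dim):
                frame[d] += phi[n] * target[d]
        frames.append(frame)
    return frames


# Toy segment: two 2-dimensional targets with a linear cross-fade over three frames.
seg = StructuredSegment(
    targets=[[1.0, 0.0], [0.0, 2.0]],
    interpolation=[[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]],
    f0_hz=120.0,
)
frames = reconstruct(seg)
print(frames)
```

Under this representation, a rule could act on a segment by editing its targets, reshaping or stretching its interpolation functions, or changing its excitation parameters before reconstruction, which is one way to read "handled by rules".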

Similar resources

Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis

Recent studies have shown that text-to-speech synthesis quality can be improved by using glottal vocoding. This refers to vocoders that parameterize speech into two parts, the glottal excitation and vocal tract, that occur in the human speech production apparatus. Current glottal vocoders generate the glottal excitation waveform by using deep neural networks (DNNs). However, the squared error-b...

Voiced speech as secondary response of a self-consistent fundamental drive

Voiced segments of speech are assumed to be composed of non-stationary acoustic objects which can be described as stationary response of a non-stationary fundamental drive (FD) process and which are furthermore suited to reconstruct the hidden FD by using a voice adapted (self-consistent) parttone decomposition of the speech signal. The universality and robustness of human pitch perception enco...

Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be particularly promising in the next ye...

Enhanced shape-invariant pitch and time-scale modification for concatenative speech synthesis

To preserve shape-invariance when pitch or time-scale modifying sinusoidally modelled voiced speech, the phases of the sinusoids used to model the glottal excitation are made to add coherently at estimated excitation points. Previous methods achieve this by estimating excitation phases at synthesis frame boundaries, disregarding the frequency modulation that may occur between the frame boundary...

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks

This work studies the use of deep learning methods to directly model glottal excitation waveforms from context dependent text features in a text-to-speech synthesis system. Glottal vocoding is integrated into a deep neural network-based text-to-speech framework where text and acoustic features can be flexibly used as both network inputs or outputs. Long short-term memory recurrent neural networ...

Publication date: 1989